Abstract



We investigate the sequence-dependent properties of proteins that determine 
the dual requirements of stability of the native state and its kinetic accessibility 
using simple cubic lattice models. Three interaction schemes are used to 
describe the potentials between nearest neighbor non-bonded beads. We show 
that, under the simulation conditions when the native basin of attraction 
(NBA) is the most stable, there is an excellent correlation between folding 
times $\tau_F$ and the dimensionless parameter $\sigma_T = (T_\theta - T_F)/T_\theta$, where $T_\theta$ is the
collapse temperature and $T_F$ is the folding transition temperature. There is
also a significant correlation between $\tau_F$ and another dimensionless quantity
$Z = (E_N - E_{ms})/\delta$, where $E_N$ is the energy of the native state, $E_{ms}$ is the
average energy of the ensemble of misfolded structures, and $\delta$ is the dispersion
in the contact energies. An approximate relationship between $\sigma_T$ and the Z-score
is derived, which explains the superior correlation seen between $\tau_F$ and
$\sigma_T$. For two state folders $\tau_F$ is linked to the free energy difference (not simply
energy gap, however it is defined) between the unfolded states and the NBA. 



I. INTRODUCTION 

Natural proteins reach their native conformation on a biologically relevant time scale of
about a second or less, starting from an ensemble of denatured conformations. The native
state of proteins is also stable (albeit marginally) under physiological conditions. The
underlying energy landscape of random sequences is far too rugged to be navigated
on a biologically relevant time scale. Thus, it is believed that protein sequences have evolved
so that the dual requirements of stability and kinetic accessibility of their native states
are simultaneously satisfied. An important question that arises from this observation is:
What are the sequence dependent properties of proteins that govern their foldability? The
sequences that satisfy the above stated dual requirements are considered to be foldable, and
hence are biologically competent. This and related questions have attracted considerable
theoretical attention over the last several years. Minimal protein models,
which capture some but not all the energetic balances in proteins, are particularly suited to
provide a detailed answer to the question posed here.

Three proposals in the literature have attempted to identify the
characteristics of sequences that give some insight into foldability. Below we briefly
describe the three criteria in the order of their appearance in the literature:



1. Using the random energy model (REM) as a caricature of proteins, it has been
suggested that foldable sequences have large values of $T_F/T_{g,eq}$, where $T_F$ is the
folding transition temperature and $T_{g,eq}$ is an equilibrium glass transition temperature,
which in the original REM is associated with the temperature at which the
entropy vanishes. It has been subsequently realized that in order to utilize this criterion
in lattice models $T_{g,eq}$ has to be replaced by a kinetic glass transition temperature.



2. Theoretical considerations and lattice and off-lattice model simulations show that for
optimal foldable sequences the collapse transition temperature $T_\theta$ is relatively close to $T_F$.
In other words, sequences that fold extremely rapidly have small values of

$$\sigma_T = \frac{T_\theta - T_F}{T_\theta} \tag{1}$$

where $T_\theta$ is the temperature at which the polypeptide chain makes a transition from the
random coil state to a set of compact conformations. The characteristic temperatures
$T_\theta$ and $T_F$ are equilibrium properties that can be altered not only by changing the
external conditions, but also by mutations. The collapse transition temperature
is, in principle, measurable from the temperature dependence of the radius of gyration,
which can be measured using small angle X-ray scattering (or neutron scattering)
experiments. We have shown that, depending on the values of $\sigma_T$, the very nature
of the folding kinetics can be dramatically altered. In particular, sequences for
which $\sigma_T$ is small (here the processes of collapse and the acquisition of the native state are
indistinguishable) fold by two state kinetics. Such sequences are also stable over a
wider variety of external solvent conditions. On the other hand, sequences for which
$\sigma_T$ is relatively large exhibit more complicated kinetics.



3. Finally, it has been argued that the "necessary and sufficient" condition for a sequence
to be foldable is that there be a large energy gap (with dimensions kcal/mol) or that the
native state be a "pronounced" minimum in energy. The validity of this criterion,
even for lattice models, has been questioned in several articles.
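
The second criterion is simple enough to evaluate directly. The following is a minimal sketch, assuming the two characteristic temperatures have already been determined; the function name `sigma_t` is ours and is not taken from the paper. The numerical values are those quoted in the Methods section for the 36-mer of Fig. 1.

```python
def sigma_t(t_theta: float, t_f: float) -> float:
    """Return sigma_T = (T_theta - T_F) / T_theta, as in Eq. (1)."""
    if t_theta <= 0:
        raise ValueError("T_theta must be positive")
    return (t_theta - t_f) / t_theta


# T_theta = 1.14 and T_F = 0.80 are the values quoted for the Fig. 1 sequence.
example = sigma_t(1.14, 0.80)
print(round(example, 2))
```

Because $\sigma_T$ is a ratio of temperatures, it is independent of the energy unit chosen for a given interaction scheme, which is what allows sequences from different models to be compared.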



In general, sequences which fold rapidly (small values of $\sigma_T$) are most easily generated
by performing some sort of optimization in sequence space. One popular way of getting
optimized sequences is to minimize the dimensionless Z-score defined as

$$Z = \frac{E_N - E_{ms}}{\delta} \tag{2}$$

where $E_N$ is the energy of the native state, $E_{ms}$ is the average energy of the misfolded (or
partially folded) states, and $\delta$ is the dispersion in the contact energies. The purpose of
this paper is to investigate if the folding rates are correlated with the Z-score in a manner
similar to the correlation between folding times $\tau_F$ and $\sigma_T$. We show, using a
database of several lattice models of proteins, that there is a significant correlation between
$\tau_F$ and the dimensionless Z-score. The correlation, however, is not as strong as that
seen between $\tau_F$ and $\sigma_T$, at least in these models. The rest of the paper is organized as follows. In
section II we present the models and the computational protocol. In section III the stability
of the native state under the simulation conditions is established. The correlations of
$\tau_F$ with $\sigma_T$ and with the Z-score are also discussed. We also establish an approximate relationship
between $\sigma_T$ and the Z-score. The paper is concluded in section IV with some additional
remarks.



II. METHODS 

A. Lattice Models of Proteins 

We model a protein sequence as a self-avoiding walk on a cubic lattice with the spacing
$a = 1$. A conformation of a polypeptide chain is given by vectors $\{\vec{r}_i\}$, $i = 1, 2, \ldots, N$. The
value of $N$ for three sequences is 36 and for the remaining nineteen sequences $N = 27$. If
two nonbonded beads $i$ and $j$ ($|i - j| \ge 3$) are nearest neighbors on a lattice, i.e., $|\vec{r}_i - \vec{r}_j| = a$,
they form a contact. The energy of a conformation is given by the sum of interaction energies
$B_{ij}$ associated with the contacts between beads

$$E = \sum_{i<j} \Delta(|\vec{r}_i - \vec{r}_j| - a)\, B_{ij} \tag{3}$$

where $\Delta(x)$ is unity when $x = 0$ (i.e., when $|\vec{r}_i - \vec{r}_j| = a$) and is zero otherwise. We have used three forms for
the contact matrix elements $B_{ij}$ which mimic the diversity of interactions between various
amino acids. Sixteen sequences with $N = 27$ in this study have the contact matrix elements
$B_{ij}$ obtained from a Gaussian distribution [11]

$$P(B_{ij}) = \frac{1}{(2\pi B^2)^{1/2}} \exp\left[-\frac{(B_{ij} - B_0)^2}{2B^2}\right] \tag{4}$$

where $B_0$ is the average attraction interaction and the dispersion $B$ gives the extent of
diversity of the interactions among beads. The energies for these sequences are measured in
terms of $B$, which is set to unity; $B_0$ is taken to be $-0.1$. We will refer to this interaction
scheme as the random bond (RB) model. For three other sequences with $N = 27$ and
one sequence with $N = 36$, $B_{ij}$ are taken from Table III of ref. [2?]. We will refer to this
interaction scheme as the KGS model. A modified form of the Miyazawa-Jernigan potentials
is used for two $N = 36$ sequences. We will denote this interaction scheme as the MJ
model.
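
The contact energy function and the RB interaction scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the helper names `random_bond_matrix` and `conformation_energy` are ours, beads are assumed to sit on integer lattice sites with spacing $a = 1$, and contacts are counted only for $|i - j| \ge 3$.

```python
import numpy as np


def random_bond_matrix(n_beads, b0=-0.1, b=1.0, seed=0):
    """Symmetric matrix of contact energies B_ij drawn from a Gaussian
    with mean b0 and dispersion b (diagonal left at zero)."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.normal(b0, b, size=(n_beads, n_beads)), k=1)
    return upper + upper.T


def conformation_energy(coords, bmat, a=1):
    """Sum B_ij over non-bonded pairs (|i - j| >= 3) whose lattice
    distance equals the spacing a, following Eq. (3)."""
    coords = np.asarray(coords, dtype=int)
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 3, len(coords)):
            # Squared distance avoids floating-point square roots.
            if np.sum((coords[i] - coords[j]) ** 2) == a * a:
                energy += bmat[i, j]
    return energy


# Hypothetical 4-bead walk around a unit square: only beads 0 and 3 touch.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
bmat = random_bond_matrix(4)
print(conformation_energy(square, bmat), conformation_energy(straight, bmat))
```

The double loop is quadratic in chain length, which is harmless for 27-mers and 36-mers; a production Monte Carlo code would instead update the energy incrementally after each move.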

Fifteen RB 27-mer sequences used in this study are taken from our previous work [11]. An
additional RB 27-mer sequence was included in our database during the course of this work
to expand the range of $\sigma_T$ values. Of the sixteen sequences nine have maximally compact
native states, while the remaining seven sequences have non-compact native structures.
Three KGS 27-mer sequences have identical maximally compact native structures. Similarly,
three 36-mer sequences have identical maximally compact native conformations. The native
conformation of the 36-mer sequences is shown in Fig. (1a).



For each sequence we performed Monte Carlo simulations (for details, see ref. [11]) and
determined $T_\theta$ and $T_F$ using the multiple histogram technique, which is described in the
context of protein folding elsewhere. Briefly, $T_\theta$ is associated with the peak of the specific
heat $C_v$ as a function of temperature. Such estimates of $T_\theta$ coincide with the peak in the
derivative of the temperature dependence of the radius of gyration $\langle R_g \rangle$. The folding
transition temperature is obtained from the peak of the fluctuations in the overlap function,
$\Delta\chi$. These methods have been successfully used to obtain the two characteristic equilibrium
temperatures for lattice, off-lattice, and all-atom models of proteins [28,29].
In Fig. (1b) we show the temperature dependence of $\Delta\chi$, $C_v$, and $d\langle R_g \rangle/dT$ for the
sequence whose native conformation is shown in Fig. (1a). From the peaks of these plots
we get $T_F = 0.80$ and $T_\theta = 1.14$, so that $\sigma_T$ (see Eq. (1)) for this sequence is 0.30.
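
Once the thermodynamic curves are available on a temperature grid, reading off the two characteristic temperatures amounts to locating their peaks. The sketch below assumes the curves have already been computed (for example by histogram reweighting, which is not shown); the curves used here are synthetic Gaussians chosen only so that the peaks fall at the values quoted above, and the helper name `peak_temperature` is ours.

```python
import numpy as np


def peak_temperature(temps, values):
    """Return the grid temperature at which `values` is largest."""
    temps = np.asarray(temps, dtype=float)
    values = np.asarray(values, dtype=float)
    return float(temps[np.argmax(values)])


# Synthetic, purely illustrative curves (not data from the paper).
temps = np.linspace(0.5, 1.5, 101)
c_v = np.exp(-((temps - 1.14) / 0.10) ** 2)        # stand-in for the specific heat
delta_chi = np.exp(-((temps - 0.80) / 0.05) ** 2)  # stand-in for overlap fluctuations

t_theta = peak_temperature(temps, c_v)
t_f = peak_temperature(temps, delta_chi)
sigma = (t_theta - t_f) / t_theta
print(t_theta, t_f, sigma)
```

Taking the global maximum matches the convention used for Fig. 1, where $T_\theta$ is read from the larger of two specific heat peaks; curves with several comparable peaks would need a more careful treatment.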

We also computed the Z-score (see Eq. (2)) for the twenty two sequences. The values of
$E_{ms}$ were calculated as $E_{ms} = c\langle B \rangle$, where $c$ is the number of contacts in the misfolded
structures and $\langle B \rangle$ is the average contact energy for a particular sequence. The dispersion
$\delta$ is determined from $\delta^2 = \langle B^2 \rangle - \langle B \rangle^2$, where $\langle B^2 \rangle$ is the average of the square
of contact energies. In general, $c$ is equal to the number of contacts in the native state,
which for maximally compact structures is 28 for $N = 27$ and 40 for $N = 36$. The kinetic
simulations are done at sequence dependent temperatures $T_s$ [11,19], which were determined
by the condition $\langle \chi(T_s) \rangle = \alpha$. This criterion for choosing $T_s$ allows several sequences to
be compared on equal footing regardless of topology and the nature of interaction potentials
used. The value of $\alpha = 0.21$ is chosen so that $T_s < T_F$ for all sequences. This ensures
that the native conformation, or more precisely the native basin of attraction, is the most
dominant at $T = T_s$.
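
The Z-score recipe in this paragraph translates into a few lines of code. The sketch below is an assumption-laden illustration: the function name `z_score` is ours, and we take "the contact energies for a particular sequence" to be a flat list of that sequence's $B_{ij}$ values, which the text does not spell out.

```python
import numpy as np


def z_score(contact_energies, native_energy, n_contacts):
    """Z = (E_N - E_ms) / delta with E_ms = c <B> and
    delta^2 = <B^2> - <B>^2, following Eq. (2) and the text."""
    b = np.asarray(contact_energies, dtype=float)
    mean_b = b.mean()
    delta = np.sqrt((b ** 2).mean() - mean_b ** 2)
    e_ms = n_contacts * mean_b
    return (native_energy - e_ms) / delta


# Hypothetical numbers: <B> = 0 and delta = 1, so Z equals the native energy.
print(z_score([-1.0, 1.0], native_energy=-10.0, n_contacts=28))
```

A more negative Z-score corresponds to a native state that lies further below the average misfolded energy in units of the dispersion.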

The folding times are calculated from the time dependence of the fraction of unfolded
molecules, $P_u(t)$. The function $P_u(t)$ may be computed from the distribution of first
passage times. Operationally, for every sequence an ensemble of initial denatured conformations
(obtained at $T > T_\theta$) is generated. For each initial condition the temperature is
reduced to $T_s$ and the dynamics is followed until the first passage time is reached. Typically
we generated between 200 and 500 independent trajectories in order to reduce the statistical
error in determining $\tau_F$ to about 5%.
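
The link between first passage times, $P_u(t)$ and the folding time can be sketched as follows. As an assumption for illustration, $\tau_F$ is estimated here as the mean first passage time (the time integral of $P_u(t)$); the original work may instead fit $P_u(t)$ to one or more exponentials. The helper names are ours.

```python
import numpy as np


def unfolded_fraction(first_passage_times, t):
    """P_u(t): fraction of trajectories that have not yet folded at time t."""
    fpt = np.asarray(first_passage_times, dtype=float)
    return float(np.mean(fpt > t))


def folding_time(first_passage_times):
    """Mean first passage time and its standard error."""
    fpt = np.asarray(first_passage_times, dtype=float)
    mean = float(fpt.mean())
    sem = float(fpt.std(ddof=1) / np.sqrt(len(fpt)))
    return mean, sem


# Hypothetical first passage times, in Monte Carlo steps.
fpt = [1.0, 2.0, 3.0, 4.0]
tau, err = folding_time(fpt)
print(unfolded_fraction(fpt, 2.5), tau, err)
```

For exponentially distributed first passage times the relative standard error of the mean is $1/\sqrt{M}$ for $M$ trajectories, so $M = 400$ gives about 5%, in line with the 200 to 500 trajectories quoted above.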



III. RESULTS 

(a) Stability of the native basin of attraction: Due to the discrete nature of spatial
and energetic representation the native state of lattice models of proteins is a single microstate.
In the coarse grained energy representation every term in the energy function has
Ising like discreteness (see Eq. (3)). Since these models represent a coarse grained caricature
of proteins it is useful to define a native basin of attraction (NBA). This is necessary because
the idea that the native state is a single microstate is clearly unphysical. The native basin
of attraction has a volume associated with it. The larger such a volume is, the smoother
one expects the underlying energy landscape to be. The probability of being in the NBA is
defined as

$$P_{NBA}(T) = \frac{\sum_i \delta(\chi_i < \chi_{NBA}) \exp(-E_i/kT)}{\sum_i \exp(-E_i/kT)} \tag{5}$$

where $\chi_{NBA}$ is the value of the overlap function at the folding transition temperature $T_F$,
$E_i$ is the energy of the conformation $i$, and $\chi_i$ is the corresponding value of the overlap. The
overlap function is defined as

$$\chi = 1 - \frac{1}{N^2 - 3N + 2} \sum_{i \ne j, j \pm 1} \delta(r_{ij} - r_{ij}^{N}) \tag{6}$$

where $r_{ij}$ is the distance between beads $i$ and $j$ and $r_{ij}^{N}$ is the distance between the same
beads in the native conformation. According to Eq. (5) all conformations with overlaps less
than $\chi_{NBA}$ map onto the NBA, which implies that a steepest descent quench would directly
lead these conformations to the NBA. The above definition of NBA is physically appealing.
For all the sequences $T_F$, obtained from the peak of fluctuations in the overlap function,
nearly coincides with $T_F$ determined using $P_{NBA}(T_F) = 0.5$.
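
The overlap function and the NBA probability can be sketched as below. Several points are assumptions for illustration: the helper names `overlap` and `p_nba` are ours, beads sit on integer lattice sites, the sum in the overlap runs over ordered non-bonded pairs (which is what makes the normalization $N^2 - 3N + 2$ count them exactly), and the Boltzmann sum runs over an explicit list of conformations rather than over reweighted histograms.

```python
import numpy as np


def overlap(coords, native_coords):
    """chi = 1 - (number of non-bonded pairs whose distance matches the
    native distance) / (N^2 - 3N + 2); chi = 0 for the native state."""
    x = np.asarray(coords, dtype=int)
    x0 = np.asarray(native_coords, dtype=int)
    n = len(x)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    d2_native = ((x0[:, None, :] - x0[None, :, :]) ** 2).sum(axis=-1)
    i, j = np.indices((n, n))
    nonbonded = np.abs(i - j) >= 2  # excludes i == j and bonded neighbors
    matches = np.count_nonzero((d2 == d2_native) & nonbonded)
    return 1.0 - matches / (n * n - 3 * n + 2)


def p_nba(energies, overlaps, chi_nba, temperature, k_b=1.0):
    """Boltzmann weight of conformations with chi < chi_nba, normalized
    by the total weight."""
    e = np.asarray(energies, dtype=float)
    chi = np.asarray(overlaps, dtype=float)
    w = np.exp(-(e - e.min()) / (k_b * temperature))  # shifted for stability
    return float(w[chi < chi_nba].sum() / w.sum())


# Hypothetical 4-bead example: a unit square as "native", a rod as "unfolded".
native = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
rod = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
print(overlap(native, native), overlap(rod, native))
print(p_nba([-1.0, 0.0], [0.0, 1.0], chi_nba=0.5, temperature=1.0))
```

Subtracting the minimum energy before exponentiating leaves the ratio unchanged and avoids overflow at low temperature.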

In order to demonstrate the stability of the NBA we have calculated $P_{NBA}(T_s)$ for all
the sequences. Recall that the sequence dependent temperatures at which the simulations
are performed are chosen so that $\langle \chi(T_s) \rangle = 0.21$. In Fig. (2a) we show $P_{NBA}(T_s)$ for the
twenty two sequences. For all the sequences the probability of being in the NBA exceeds 0.5,
which implies that for the simulation conditions employed here the stability of the native
state is automatically ensured. Among the nineteen 27-mer sequences fifteen are exactly the
same ones as reported in our previous work [11], and the temperatures of the simulations
are identical to those used in our earlier studies. Thus, even in our earlier work the stability
of the NBA at the simulation temperatures was guaranteed.

(b) Dependence of folding times on $\sigma_T$: The folding times for the 22 sequences considered
here have been computed using methods described in detail elsewhere. In Fig.
(2b) we plot the dependence of $\tau_F$ on $\sigma_T$. This figure clearly shows that the folding times
correlate extremely well with $\sigma_T$ under conditions when the NBA is stable (see Fig. (2a)). A
relatively small change in $\sigma_T$ can lead to a dramatic increase in folding times. For example,
an increase in $\sigma_T$ by a factor of four results in a three orders of magnitude increase in $\tau_F$. The
figure also shows that the dual requirements of thermodynamic stability of the NBA and
kinetic accessibility of the NBA are satisfied for the sequences with relatively small values
of $\sigma_T$. This verifies the foldability principle, which states that fast folding sequences with
stable native states have $T_F \approx T_\theta$.

There are two important points concerning the results presented in Fig. (2b): (1) The
excellent correlation shown in Fig. (2b) should be considered as statistical, i.e., we expect
to find such dependence of $\tau_F$ on $\sigma_T$ only if a number of sequences over a range of $\sigma_T$ is
examined. This implies that it is not possible to predict the relative rates of folding for
sequences whose $\sigma_T$ values are close. Such sequences are expected to fold on similar time
scales. (2) Foldable sequences with small values of $\sigma_T$ reach the NBA over a wider range of
external conditions than those with moderate values of $\sigma_T$. Since not all naturally occurring
proteins fold rapidly it follows that there are proteins with moderate values of $\sigma_T$ that reach
their NBA by complex kinetics. This involves, in addition to the direct pathway to
the native state, off-pathway processes involving intermediates. Such sequences reach the
NBA by a kinetic partitioning mechanism.
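
A statistical correlation of this kind is typically quantified by regressing $\ln \tau_F$ on $\sigma_T$ across the whole set of sequences. The sketch below shows one way to do this; the helper name `log_linear_fit` is ours, and the data are synthetic numbers generated from an exact exponential law, not results from the paper.

```python
import numpy as np


def log_linear_fit(sigma, tau):
    """Least-squares fit of ln(tau) = intercept + slope * sigma, plus the
    Pearson correlation coefficient between sigma and ln(tau)."""
    sigma = np.asarray(sigma, dtype=float)
    log_tau = np.log(np.asarray(tau, dtype=float))
    slope, intercept = np.polyfit(sigma, log_tau, 1)
    r = np.corrcoef(sigma, log_tau)[0, 1]
    return float(slope), float(intercept), float(r)


# Synthetic illustration: tau grows exponentially with sigma by construction.
sigma_values = np.array([0.1, 0.2, 0.3, 0.4])
tau_values = np.exp(10.0 + 20.0 * sigma_values)
slope, intercept, r = log_linear_fit(sigma_values, tau_values)
print(slope, intercept, r)
```

With real data the scatter around the fitted line is what limits predictions for sequences whose $\sigma_T$ values are close, which is the first point made above.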

(c) Relationship between $\tau_F$ and Z-score: The folding times as a function of the
Z-score for our database of sequences are shown in Fig. (3a). It is clear that there is a
significant correlation between the two. However, the correlation here is not as good as that
in Fig. (2b). The plausible reasons are given in the next subsection in which the relationship
between Z-score and $\sigma_T$ is explored. It is tempting to think that because the numerator of
the Z-score is some measure of the so called stability gap one can conclude that folding rates
are linked just to $E_N - E_{ms}$. We show below that this is not the case.

(d) Relationship between Z-score and $\sigma_T$: The significant correlations between the
folding times and $\sigma_T$ and Z-score suggest that there might be a relationship between these
two dimensionless quantities. We arrive at an approximate relationship between the two
which also explains the reasons for the superior correlation between $\tau_F$ and $\sigma_T$.

The rationale for using $\sigma_T$ as a natural criterion that satisfies the dual requirements of
stability and kinetic accessibility is the following. The transition from compact states
to the native state at $T_F$ is usually first order, and neglecting the entropy associated with
the native state $T_F$ is approximately given by

$$T_F \approx \frac{|\delta E_{SG}|}{S_{NN}} \tag{7}$$

where $\delta E_{SG}$ is roughly the stability gap and $S_{NN}$ is the entropy of the (non-native) states
whose average energy is roughly $|\delta E_{SG}|$ above the NBA. If there is considerable entropy
associated with the NBA this has to be subtracted from the denominator of Eq. (7). Since
our arguments do not really depend on this, we ignore it here. The transition from the
random coil states to the collapsed states occurs at $T_\theta \approx D/k_B$, where $D$ is the driving
force that places the hydrophobic residues in the core and the polar residues in the exterior,
creating an interface between the compact molecule and water.

The entropy of the intervening non-native states is a function of $D$. Consider the case
of large $D$. In this case the polypeptide chain will undergo a non-specific collapse into one
of the exponentially large number of compact conformations. This renders $S_{NN}$ extensive
in $N$, where $N$ is the number of amino acid residues in the polypeptide chain. This makes
$T_F$ very low even for moderate sized proteins. In the opposite limit, when $D$ is small, there
is not enough driving force to create a compact structure. Here again $S_{NN}$ is extensive
and as a result $T_F$ becomes low. Thus an optimum value of $D$, which reflects a proper
balance of local interactions (leading to secondary structures) and long range interactions
(causing compaction and formation of tertiary structure), is necessary so that $S_{NN}$ be
small enough. This would make $T_F$ as large as possible without exceeding the bound $T_\theta$.
Thus, optimizing $\delta E_{SG}$, $S_{NN}$, and $D$ (hence $T_\theta$) leads to small values of $\sigma_T$, which therefore
emerges as a natural parameter that determines the folding rates and stability.

Consider the spectrum of states of protein-like heteropolymers. It has been suggested,
using computational models and theoretical arguments [15], that generically the spectrum
of states consists of the NBA separated from the non-native states by $\delta E_{SG}$. Above the
manifold of non-native states one has the ensemble of random-coil conformations. A lower
limit of the energy separating the random coil conformations and the non-native compact
structures is $\delta$, which is the dispersion in the energy of the non-native states. (In lattice
models $\delta$ is associated with the dispersion in the contact energies.) Thus $k_B T_\theta > \delta$. Assuming$^{1}$
that the density of non-native structures in the energy range $-\delta/2 < E - \delta E_{SG} < \delta/2$



$^{1}$ Notice that the arguments do not depend on the precise form of the density of states. All
we require is that the density of misfolded states has the functional dependence so that $S_{NN} =$






is $\Omega_{NN} \sim (E - \delta E_{SG})^{\alpha}$ (where $\alpha$ is an even integer) we get $S_{NN} \approx k_B \ln(\delta/\delta_0)$, where $\delta_0$ is
a sequence dependent constant. From these arguments it follows that

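$$\frac{T_F}{T_\theta} \lesssim \frac{|Z|}{S_{NN}/k_B} \approx \frac{|\delta E_{SG}|}{\delta}\,\frac{1}{\ln(\delta/\delta_0)} \tag{8}$$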

if $\delta E_{SG}$ is identified with $E_N - E_{ms}$. Thus, maximizing the ratio $T_F/T_\theta$ (or minimizing $\sigma_T$)
is approximately equivalent to minimizing the ratio $Z/(S_{NN}/k_B)$. It is perhaps the neglect
of the entropy of the non-native states that leads to the poorer correlation between $\tau_F$ and
Z-score as compared to the correlation between $\tau_F$ and $\sigma_T$ (see Figs. (2b) and (3a)).
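
The chain of reasoning behind this relationship can be written out step by step. The following is our reading of the argument, using only the approximate expression for $T_F$, the bound $k_B T_\theta > \delta$, and the identification of $\delta E_{SG}$ with the numerator of the Z-score; it is a sketch, not a rigorous derivation.

```latex
\begin{align}
T_F &\approx \frac{|\delta E_{SG}|}{S_{NN}}, \qquad k_B T_\theta > \delta, \\
\frac{T_F}{T_\theta} &= \frac{k_B T_F}{k_B T_\theta}
   < \frac{k_B\,|\delta E_{SG}|}{\delta\, S_{NN}}
   = \frac{|Z|}{S_{NN}/k_B},
   \qquad Z = \frac{\delta E_{SG}}{\delta}, \\
\sigma_T &= 1 - \frac{T_F}{T_\theta} > 1 - \frac{|Z|}{S_{NN}/k_B}.
\end{align}
```

The last line makes explicit why a large $|Z|$ alone does not guarantee a small $\sigma_T$: the entropy $S_{NN}$ of the non-native states enters on the same footing.
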
(e) Linking $\tau_F$ to various definitions of the energy gap: Since the numerator of the
Z-score is an estimate of $\delta E_{SG}$, it might be tempting to conclude that there is a relationship
between $\tau_F$ and the associated energy gap. In the context of minimal models of proteins
a number of definitions of the "energy gap" have been proposed. It is useful to document
these definitions:

1. Standard Energy Gap: The time honored definition of the energy gap for any system
(not consisting of fermions) is $\Delta = E_N - E_1$, where $E_N$ is the energy of the ground
(native) state and $E_1$ is that of the first excited state. This definition is usually deemed
inappropriate for protein-like lattice models because a flip of one of the beads can lead
to a trivial structural change, especially for non-compact native states. Such structures
would belong to the NBA (see Eq. (5)). In real models in water, even a casual flip
of a residue could involve substantial solvent rearrangements, resulting in a significant
energy (or enthalpy) penalty.



2. Compact Energy Gap: The compact energy gap is $\Delta_{CS} = E_N^{CS} - E_1^{CS}$, where $E_N^{CS}$
and $E_1^{CS}$ are the energies of the native state and the first excited state respectively.
The superscript CS indicates that these conformations are restricted to the ensemble
of maximally compact conformations. It has been shown that the ground states of
many sequences are non-compact. Furthermore, in several cases the lowest energy
of the maximally compact conformation is much greater than those of the manifold of
non-compact non-native structures [11]. The correlation between $\tau_F$ and $\Delta_{CS}$ is, at
best, poor (see Fig. (22) of ref. [11]).



3. Stability gap: The notion that $\delta E_{SG}$ should play an important role in determining
both the stability and the folding rates is based on sound physical arguments. We
believe that its close relation to $T_F$ makes the stability gap a very useful physical
concept.

4. Z-score gap: This is defined as $\Delta_Z = E_N - E_{ms}$, which is the numerator of Eq. (2).
This is closely related to $\delta E_{SG}$, and for practical purposes may be identical. The precise
value for $E_{ms}$ depends on the given sequence, and even the practitioner. The great



$k_B \ln \Omega_{NN}$ be positive. For example, if $\Omega_{NN}(E) \sim \exp(E - \delta E_{SG})$, then $S_{NN} \sim k_B \ln[\sinh(\delta/2\delta_0)]$,
where $\delta_0$ is a suitable constant.
utility of the Z-score is that one can use it as a technical device to assess the efficiency
of threading algorithms or for generating sequences that are good folders. For
these two purposes the precise values of $E_{ms}$ do not appear to be very important.
Since $E_{ms}$ can be altered freely the definition of $\Delta_Z$ is somewhat ambiguous. In this
article we have used the same definition of $E_{ms}$ for all sequences, and this allows us to
assess the efficiency of the Z-score in determining folding kinetics.

We have tested the relationship between $\tau_F$ and $\Delta_Z$ using the database of 22 sequences.
A plot of the folding time for the 22 sequences as a function of $\Delta_Z$ is given in Fig.
(3b). We see a very poor correlation between $\tau_F$ and $\Delta_Z$ for all sequences included
in our database. There appears to be a link between $\tau_F$ and $\Delta_Z$ for the three 27-mer
sequences with the KGS potentials but none for the three 36-mer sequences. The
number of sequences with the KGS interaction scheme and the modified MJ scheme is
too small for a meaningful trend to be established. However, it is clear that the overall
correlation between $\tau_F$ and $\Delta_Z$ is poor.

(f) Probing the correlation between $\tau_F$ and the free energy of stability: The
various energy gaps described above do not adequately correlate with $\tau_F$. A plausible reason
could be that the energy gaps ignore the entropy of the chain in the denatured states. Here,
we explore the idea that the free energy of stability of the native state itself could be an
indicator of foldability. Consider a large number of two state folders. In this case only
the NBA and the ensemble of unfolded states, which have very little overlap with the native
state, are significantly populated. From a physical point of view, the appropriate equilibrium
quantity that could correlate with the folding rates is the free energy difference between the
two states. We consider the free energy of stability defined for two state folders (with small
values of $\sigma_T$) as

$$\Delta F_{U-N} = -k_B T_s \ln K(T_s) \tag{9}$$

with the equilibrium constant

$$K(T_s) = \frac{P_{NBA}(T_s)}{1 - P_{NBA}(T_s)} \tag{10}$$

where $T_s$ is the simulation temperature and $P_{NBA}(T)$ is given in Eq. (5). In experiments
$\Delta F_{U-N}$ should be replaced by $\Delta G_{H_2O}$, which gives the stability of the native state in the
limit of zero denaturant concentration.
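
The free energy of stability follows directly from the NBA population at the simulation temperature. A minimal sketch, assuming $P_{NBA}(T_s)$ has already been computed and using units in which $k_B = 1$ by default; the function name `stability_free_energy` is ours.

```python
import math


def stability_free_energy(p_nba, t_s, k_b=1.0):
    """Delta F_{U-N} = -k_B T_s ln K with K = P_NBA / (1 - P_NBA)."""
    if not 0.0 < p_nba < 1.0:
        raise ValueError("P_NBA must lie strictly between 0 and 1")
    k_eq = p_nba / (1.0 - p_nba)
    return -k_b * t_s * math.log(k_eq)


# Hypothetical values: an NBA population of 0.9 at T_s = 0.7.
print(stability_free_energy(0.9, 0.7))
```

With this sign convention the free energy of stability is negative whenever the NBA population exceeds 0.5, and it vanishes at the folding transition temperature, where the population is exactly 0.5.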

In order to examine the dependence of $\tau_F$ on $\Delta F_{U-N}$ we singled out the two state folders
from our database. For these sequences we computed $\Delta F_{U-N}$ using Eq. (9). In Fig. (4)
we plot $\tau_F$ as a function of $\Delta F_{U-N}$. We do find an important correlation which approaches the
quality of that shown in Fig. (2b). For the two state folders it is clear that $\Delta F_{U-N}$ is a
good estimate of $k_B T_F$ and hence $T_\theta$ (since $\sigma_T$ is small). Thus the correlation seen in Fig.
(4) is not entirely unexpected.

IV. CONCLUSIONS 

The variations in folding times for a variety of sequences under conditions when the
native basin of attraction is the most populated can be understood in terms of $\sigma_T$ (see Eq.
(1)) [10,11]. Thus, the simultaneous requirements of thermodynamic stability and kinetic
accessibility are satisfied for sequences for which $\sigma_T$ is small. Such sequences are foldable
over a broad range of external conditions.

It might be tempting to conclude that computation of $T_\theta$ and $T_F$ for lattice models
requires exhaustive simulations. This is not the case. In order to get a reasonably accurate
estimate of $T_\theta$ and $T_F$ by the multiple histogram method we find that, for most sequences,
between 8 and 10 trajectories, each with about 50 million Monte Carlo steps, are sufficient.
By comparison, reliable determination of folding kinetics time scales requires a few hundred
trajectories at various temperatures. Thus, $\sigma_T$ is a useful criterion for designing fast folding
sequences. By contrast, notice that when a Z-score optimized sequence is generated, its
thermodynamics (as well as kinetics) is a priori unknown. A separate set of simulations has
to be performed at various temperatures in order to obtain its thermodynamics.

There is a significant correlation between Z-score and the rates of folding. This correlation
is not nearly as good as the one between $\tau_F$ and $\sigma_T$. The connection between $\sigma_T$ and
the Z-score suggests that this could arise because the entropy of the non-native states (or
more precisely the entropy difference between the non-native states and the native basin of
attraction) is not taken into account in the Z-score. More importantly, the Z-score does not
appear to be easily measurable, making its experimental validation difficult, if not impossible.

There appears to be no useful predictive relationship between the various energy gaps and
the folding times. In seeking a correlation involving energetics and entropy of the unfolded
and folded states we have found that for two state folders the free energy of stability of the
native state with respect to unfolded states correlates well with the folding times. Note that
this quantity includes the entropies of the NBA and the unfolded states. The correlation
between the folding time and $\sigma_T$, and with the free energy of stability for two state folders,
can be verified experimentally.
FIGURES



Fig. (1) (a) The conformation of the native state of a 36-mer MJ sequence. This sequence
is SQKWLERGATRIADGDLPVNGTYFSCKIMENVHPLA, where we have used
the one letter representation of the amino acids. This conformation is the lowest energy
conformation in the native basin of attraction. (b) Temperature dependence of the fluctuations
in the overlap function $\Delta\chi$ (solid line), specific heat $C_v$ (dotted line), and the derivative
of the radius of gyration with respect to temperature $d\langle R_g \rangle/dT$ (dashed line) for the
sequence whose native state is displayed in Fig. (1a). The scale for $C_v$ and $d\langle R_g \rangle/dT$ is
given on the right. The collapse temperature $T_\theta$, obtained from the larger peak of the specific
heat $C_v$ curve, is found to be 1.14. It is seen that $T_\theta$ is very close to the temperature at
which $d\langle R_g \rangle/dT$ reaches its maximum (at 1.19). The two peaks in $d\langle R_g \rangle/dT$, with
the smaller one coinciding with the location of the maximum in $\Delta\chi$, suggest that from a
thermodynamic viewpoint a three state description is more appropriate for this sequence.
The value of $T_F$, which is associated with the peak of $\Delta\chi$, is 0.80. Therefore, for this
sequence collapse and folding transition temperatures are separated by a large interval, and $\sigma_T$
(= 0.30) is consequently large. For this sequence the value of $T_F$ obtained from the condition
$P_{NBA}(T_F) = 0.5$ is 0.79, which nearly coincides with the peak position of $\Delta\chi$. In the majority
of the sequences we only observe one peak in $C_v$ and $d\langle R_g \rangle/dT$. Hence, it is necessary
to introduce independent order parameters to determine $T_F$.

Fig. (2) (a) The values of the probability of being in the native basin of attraction $P_{NBA}$
at the sequence dependent simulation temperatures $T_s$ for the database of 22 sequences
considered in this study. The horizontal dotted line corresponds to $P_{NBA} = 0.5$. This
figure shows that at the simulation temperatures $P_{NBA}$ exceeds 0.5, which implies that the
stability criterion is automatically satisfied. (b) Plot of the folding times $\tau_F$ as a function
of $\sigma_T$ for the 22 sequences. This figure shows that under the external conditions when
the NBA is the most populated there is a remarkable correlation between $\tau_F$ and $\sigma_T$. The
correlation coefficient is 0.94. It is clear that over four orders of magnitude of folding
times $\tau_F \approx \exp(-\sigma_T/\sigma_0)$, where $\sigma_0$ is a constant. In both panels the filled and open circles
are for the RB and KGS 27-mer models, respectively. The open squares are for $N = 36$.

Fig. (3) (a) The dependence of $\tau_F$ on the Z-score. There is a significant correlation
between the folding times and the Z-score. Since the scales for the Z-score depend on both
the interaction scheme and the length of the sequence it is hard to fit the data for all 22
sequences. If we restrict ourselves to the 16 sequences in the RB model, we find that the
correlation coefficient is 0.70, which is not nearly as good as in Fig. (2b). (b) Plot of $\tau_F$ as a
function of $\Delta_Z$, which is the energy gap that appears in the numerator of Eq. (2). This figure
clearly shows that there is no correlation between $\tau_F$ and $\Delta_Z$. The various symbols are
the same as in Fig. (2b).

Fig. (4) This figure shows, for the two state folders only, the dependence of $\tau_F$ on the free
energy of stability $\Delta F_{U-N}$ of the NBA with respect to denatured states. Notice that $\Delta F_{U-N}$
is not an energy gap. It includes the entropies of the folded and unfolded states explicitly
and is obtained from the equilibrium constant between the unfolded states and the NBA at
the simulation temperature $T_s$.